Semantic Annotation for Interlingual Representation of Multilingual Texts

Authors

  • Teruko Mitamura
  • Keith Miller
  • Bonnie Dorr
  • David Farwell
  • Nizar Habash
  • Stephen Helmreich
  • Eduard Hovy
  • Lori Levin
  • Owen Rambow
  • Florence Reeder
  • Advaith Siddharthan
Abstract

This paper describes the annotation process being used in a multi-site project to create six sizable bilingual parallel corpora annotated with a consistent interlingua representation. After presenting the background and objectives of the effort, we describe the multilingual corpora and the three stages of interlingual representation being developed. We then focus on the annotation process itself, including an interface environment that supports the annotation task, and the methodology for evaluating the interlingua representation. Finally, we discuss some issues encountered during the annotation tasks. The resulting annotated multilingual corpora will be useful for a wide range of natural language processing research tasks, including machine translation, question answering, text summarization, and information extraction.

Similar resources

Ontology driven content extraction using interlingual annotation of texts in the OMNIA project

OMNIA is an ongoing project that aims to retrieve images accompanied by multilingual texts. In this paper, we propose a generic method (language and domain independent) to extract conceptual information from such texts and spontaneous user requests. First, texts are labelled with interlingual annotation, then a generic extractor taking a domain ontology as a parameter extracts relevant concep...

Semantic Annotation and Lexico-Syntactic Paraphrase

The IAMTC project (Interlingual Annotation of Multilingual Translation Corpora) is developing an interlingual representation framework for annotation of parallel corpora (English paired with Arabic, French, Hindi, Japanese, Korean, and Spanish) with deep-semantic representations. In particular, we are investigating meaning equivalent paraphrases involving conversives and non-literal language us...

Any-language frame-semantic parsing

We present a multilingual corpus of Wikipedia and Twitter texts annotated with FRAMENET 1.5 semantic frames in nine different languages, as well as a novel technique for weakly supervised cross-lingual frame-semantic parsing. Our approach only assumes the existence of linked, comparable source and target language corpora (e.g., Wikipedia) and a bilingual dictionary (e.g., Wiktionary or BABELNET...

Raising the Interlingual Ceiling with Multilingual Text Generation

In a typical interlingual machine translation (MT) system, the tasks of text planning and content selection are not explicitly performed. Rather, they are assumed to be implicit in the interlingual representation derived in the source language analysis phase. This simplifies the task of target language generation, but greatly restricts the flexibility of the resulting text. Recent MT systems ha...

Interlingual Annotation of Parallel Text Corpora: A New Framework for Annotation and Evaluation

This paper focuses on the next step in the creation of a system of meaning representation and the development of semantically-annotated parallel corpora, for use in applications such as machine translation, question answering, text summarization, and information retrieval. The work described below constitutes the first effort of any kind to provide parallel corpora annotated with detailed deep ...

Publication date: 2004